Abstract



We investigate the discrepancy principle for choosing smoothing parameters for kernel 
density estimation. The method is based on the distance between the empirical and estimated 
distribution functions. We prove some new positive and negative results on $L_1$-consistency
of kernel estimators with bandwidths chosen using the discrepancy principle. Consistency
crucially depends on a rather weak Hölder condition on the distribution function. We also
unify and extend previous results on the behavior of the chosen bandwidth under stricter
smoothness assumptions. Furthermore, we compare the discrepancy principle to standard
methods in a simulation study. Surprisingly, some of the proposals work reasonably well over 
a large set of different densities and sample sizes, and the performance of the methods at least 
up to n = 2500 can be quite different from their asymptotic behavior. 

1 Introduction 

We investigate the discrepancy principle, a simple method for choosing the bandwidth in kernel 
density estimation which - unlike most other methods like cross-validation or plug-in estimates - 
does not directly aim at minimizing the risk. 

In the following, let $X_1, \dots, X_n$ denote iid random variables having a distribution with Lebesgue
density $f$ and distribution function $F$. We denote the empirical distribution function by $F_n$.

A function $K : \mathbb{R} \to \mathbb{R}$ is called a kernel of order $\ell$ for $\ell \in \mathbb{N}$, if $u^j K(u) \in L_1(\mathbb{R})$ for $j = 0, \dots, \ell$
and

$$\int u^j K(u)\,du = \begin{cases} 1 & (j = 0) \\ 0 & (j = 1, \dots, \ell - 1). \end{cases}$$

For a kernel $K$ and $h > 0$ we define $K_h(u) := h^{-1} K(h^{-1} u)$. We denote the distribution
function associated with $K$ (which is not necessarily monotone if $K$ is not a probability density)
by $\mathcal{K}$. For iid random variables $X_1, \dots, X_n$, a kernel $K$ of order $\ell \in \mathbb{N}$ and a bandwidth $h > 0$ the
function $f_n^h(x)$ given by

$$f_n^h(x) := \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

is called the kernel density estimator. A corresponding kernel estimator of the distribution function
is given by

$$F_n^h(x) := \int_{-\infty}^{x} f_n^h(t)\,dt = \frac{1}{n} \sum_{i=1}^{n} \mathcal{K}\left(\frac{x - X_i}{h}\right) = (F_n * K_h)(x). \tag{1}$$
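As an illustration of the two estimators just defined, the following minimal sketch evaluates the kernel density estimator and the kernel estimator of the distribution function in (1). It assumes the Epanechnikov kernel for concreteness; the function names, the five sample values and the bandwidth are hypothetical choices made only for this example.

```python
import math

def epanechnikov_pdf(u):
    """Epanechnikov kernel K(u) = (3/4)(1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def epanechnikov_cdf(u):
    """Distribution function of the Epanechnikov kernel."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.25 * (2.0 + 3.0 * u - u ** 3)

def kde(x, sample, h):
    """Kernel density estimate: (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(epanechnikov_pdf((x - xi) / h) for xi in sample) / (len(sample) * h)

def smoothed_cdf(x, sample, h):
    """Kernel estimate of the distribution function as in equation (1)."""
    return sum(epanechnikov_cdf((x - xi) / h) for xi in sample) / len(sample)

# Hypothetical toy data and bandwidth, for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 1.5]
h = 0.8

# Midpoint-rule check that integrating the density estimate up to `upper`
# reproduces the smoothed distribution function at `upper`.
lower, upper, step = -3.0, 0.5, 1e-3
steps = int(round((upper - lower) / step))
integral = sum(kde(lower + (k + 0.5) * step, sample, h) for k in range(steps)) * step
print(integral, smoothed_cdf(upper, sample, h))
```

The numerical integral and the closed-form value agree up to the discretization error of the midpoint rule, which mirrors the first equality in (1).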



More important than the choice of the kernel is the choice of the bandwidth h. Depending 
on the risk function and on additional assumptions on $f$, often an explicit expression for the
(at least asymptotically) optimal value can be derived. However, it necessarily depends on some
functionals of the unknown true density $f$. Most parameter choice strategies used in practice
aim at minimizing the risk. In contrast, the strategies considered here are based on a measure of 
distance between the empirical and estimated distribution functions, i.e. a direct comparison of 
the estimate with the data. 

In the following, by the discrepancy principle for choosing the bandwidth for kernel density 
estimators we mean that h is chosen such that 



$$d(F_n, F_n^h) = s(n). \tag{2}$$



The threshold function $s : \mathbb{N} \to \mathbb{R}_+$ depends on $n$ only and fulfills $s(n) = o(1)$ for $n \to \infty$. For
the distance $d$ between distribution functions we will always take the Kolmogorov or (generalized)
Kuiper distances although, in principle, other metrics could be used. The different suggestions in
the previous literature differ in their choices of $s(n)$ and $d$ and possibly in their prescriptions for
the selection of a solution of (2) in case there are multiple solutions.

The discrepancy principle was first introduced by Morozov (1966) in the context of (deterministic)
inverse problem theory, where it is one of the most widely known methods for choosing a
regularization parameter. In Statistical Learning Theory, the connection between nonparametric
statistics and ill-posed problems is strongly emphasized, and already in the seventies density
estimation was recognized as being closely related to the problem of numerical differentiation, which
is an ill-posed problem. Methods that are adapted from deterministic inverse problem theory and
that use the discrepancy principle for choosing their smoothing parameters have been suggested by
Vapnik and Stefanyuk (1978) and Aidu and Vapnik (1989), see also Chapter 7 in Vapnik (1998)
and Chapter 7 in Vapnik (2000) for detailed accounts.

Variants of the discrepancy principle (but under different names) have also independently
been proposed in the context of the so-called Data Features or Data Approximation approach
(Davies 1995, 2008), which has its roots in robust statistics and exploratory data analysis. The
main idea is to choose the simplest estimate (with simplicity e.g. measured by smoothness) that
is sufficiently close to the data. Several procedures for density estimation based on these ideas
have been proposed, including methods based on kernel density estimators (Davies 1995), regular
histograms (Davies et al. 2009) and the taut-string estimator (Davies and Kovac 2004).



The discrepancy principle has also been used in a few other approaches to density estimation.
Eggermont and LaRiccia (1996) suggest a version for kernel density estimation that chooses a
bandwidth of the optimal order under standard assumptions; see also Eggermont and LaRiccia
(2001, Ch. 7.6). The same authors also use their method for choosing a penalty parameter
in a penalized-likelihood approach (Eggermont and LaRiccia 2001, Ch. 7.7) and in a density
deconvolution method (Eggermont and LaRiccia 1997).



The different variants of the discrepancy principle for density estimation mentioned above have 
largely been suggested independently of each other, and to our knowledge, there has never been a 
systematic investigation of this approach. 

In Section 2, we show that a solution of (2) exists under very weak conditions, and we show
that the almost sure $L_1$-consistency of the resulting kernel density estimate mainly depends on
a rather mild Hölder condition on the distribution function $F$. This condition is, for example,
fulfilled for all square-integrable densities provided that the threshold function $s$ decays slowly
enough. We also give sufficient conditions for the resulting estimator to be inconsistent. In
Section 3 we extend and unify some known results on the exact order of the chosen bandwidth.
Furthermore, we compare different versions of the discrepancy principle with standard methods
of smoothing parameter selection in a simulation study (Section 4). The methods can behave
quite differently from what is predicted by the asymptotic results even for sample sizes up to at
least n = 2500. This is not so much of a surprise as the asymptotics are mostly based on the 
law of the iterated logarithm for the empirical distribution function. Indeed, some versions of 
the discrepancy principle that were previously suggested in the literature perform reasonably well 
over a wide range of different densities, while others suffer from oversmoothing for these sample 
sizes, although they are guaranteed to undersmooth asymptotically. The last section contains 
some concluding remarks. 



2 Existence and consistency 

First, we investigate the existence of a solution of (2). We measure the distance between two
distribution functions $F$ and $G$ either by the Kolmogorov distance

$$d_\infty(F, G) := \|F - G\|_\infty$$

or by the $k$-th order Kuiper distance (for $k \in \mathbb{N}$) first introduced in Davies and Kovac (2004) and
defined by

$$d_{\mathrm{kuip},k}(F, G) := \sup_{a_1 < b_1 < a_2 < b_2 < \dots < a_k < b_k} \sum_{j=1}^{k} \left| (F(b_j) - F(a_j)) - (G(b_j) - G(a_j)) \right|.$$



For a continuous probability distribution function $F$ and the empirical distribution $F_n$ of a sample
of size $n$ drawn from $F$, the distributions of $d_\infty(F_n, F)$ and $d_{\mathrm{kuip},k}(F_n, F)$ do not depend on $F$.
For $k = 1$ we obtain the usual Kuiper distance. All these distances are topologically equivalent
and it is easy to see that

$$d_\infty(F, G) \le d_{\mathrm{kuip},k}(F, G) \le 2k\, d_\infty(F, G).$$

In the following, we always have $d = d_\infty$ or $d = d_{\mathrm{kuip},k}$ for some $k \in \mathbb{N}$, and we define

$$c_d := \begin{cases} 1 & d = d_\infty \\ 2k & d = d_{\mathrm{kuip},k}. \end{cases}$$
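The two distances can be made concrete with a short sketch for the case $k = 1$, comparing an empirical distribution function with a continuous, nondecreasing distribution function. This is only an illustration under stated assumptions: the helper names and the sample values are hypothetical, and the standard normal distribution function is used merely as an example of a continuous $G$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function (example of a continuous G)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sided_deviations(sample, cdf):
    """Return (sup(F_n - G), sup(G - F_n)) for a continuous nondecreasing G.

    F_n jumps only at the order statistics, so both suprema are found by
    comparing G at each order statistic with the value of F_n just after
    and just before the jump.
    """
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus

def kolmogorov(sample, cdf):
    """Kolmogorov distance: the larger of the two one-sided deviations."""
    return max(one_sided_deviations(sample, cdf))

def kuiper(sample, cdf):
    """Kuiper distance (k = 1): the sum of the two one-sided deviations."""
    return sum(one_sided_deviations(sample, cdf))

# Hypothetical toy sample, for illustration only.
sample = [-1.7, -0.9, -0.2, 0.05, 0.4, 0.8, 1.1, 2.3]
d_inf = kolmogorov(sample, normal_cdf)
d_kuip = kuiper(sample, normal_cdf)
print(d_inf, d_kuip)
```

For $k = 1$ the output satisfies the chain of inequalities stated above, with the Kuiper distance lying between the Kolmogorov distance and twice the Kolmogorov distance.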



It should be noted that, since we allow for higher order kernels, some distribution functions do 
not correspond to probability measures but to signed measures. 

For a kernel $K$ with associated distribution function $\mathcal{K}$, we define

$$\kappa_0 := \sup_{x} |\mathcal{K}(x) - F_0(x)|,$$

where $F_0(x) := 1(x \ge 0)$ is the distribution function of the Dirac measure in 0. In case $K$ is a
probability density, we have $\kappa_0 = \max\{\mathcal{K}(0), 1 - \mathcal{K}(0)\}$. If $K$ is also symmetric around zero, we
have $\kappa_0 = \mathcal{K}(0) = 1/2$.

The following lemma shows that, almost surely, for fixed $n$, the function $h \mapsto d(F_n, F_n^h)$ is
continuous and must, under weak conditions on $s$, take the value $s(n)$ for at least one $h$ if $n$ is
large enough. An analogous statement has been proved by Eggermont and LaRiccia (1996, 2001)
for the special case of a symmetric, nonnegative kernel of order 2 and $d = d_\infty$. The proof can
be found in Mildenberger (2011), pp. 27-28.

Lemma 2.1. For $F_n$ an empirical distribution function of an iid sample from a distribution with
continuous distribution function and $F_n^h$ as in (1) we have almost surely:

1. $d(F_n, F_n^h)$ is continuous in $h$.

2. $\liminf_{h \to 0} d(F_n, F_n^h) \le c_d \kappa_0 / n$.

3. $\limsup_{h \to \infty} d(F_n, F_n^h) \ge \kappa_0$.

Figure 1: Solutions of $d_\infty(F_n^h, F_n) = s(n)$ using a standard Gaussian kernel for $X_1, \dots, X_n \sim N(0, 1)$.
Top row: $n = 10$, bottom row: $n = 100$. Straight lines: $s(n) = 0.6 n^{-1/2}$, broken lines:
$s(n) = 0.35 n^{-2/5}$.



Lemma 2.1 shows that if $s(n) = o(1)$ and $n^{-1} = o(s(n))$ the equation $d(F_n, F_n^h) = s(n)$ almost
surely has at least one solution $h_{s,n}$ for sufficiently large $n$. These conditions are fulfilled by the
threshold functions previously proposed in the literature. Moreover, the minimum sample size
that guarantees existence of at least one solution can be calculated explicitly since it depends on
$s(n)$ and $\kappa_0$ only, and not on the sample or on the underlying true distribution (assuming there are
no ties, which holds true almost surely). For example, if $s(n) = 0.6 n^{-1/2}$ as proposed by Vapnik
(1998, Ch. 7.9) or $s(n) = 0.35 n^{-2/5}$ as proposed in Eggermont and LaRiccia (1996), $d = d_\infty$
and $K$ is any symmetric probability density, we have that $s(n) \in [\frac{1}{2n}, \frac{1}{2}]$ for $n \ge 2$, so existence
of the bandwidth can be guaranteed if there are at least two data points. As already noted by
Eggermont and LaRiccia (1996), the function $h \mapsto d(F_n, F_n^h)$ is not necessarily monotone, so that
the bandwidth chosen according to the discrepancy principle is not necessarily unique; Eggermont
and LaRiccia (1996) suggest using the smallest solution while the Data Approximation approach
would suggest using the largest one. However, none of the results given subsequently depends on
the particular choice of the solution, and multiple solutions seem to occur only rarely in larger
samples.

Figure 1 shows two realizations each for $n = 10$ and $n = 100$. The samples were drawn
from a standard normal distribution and the Gaussian kernel was used. The horizontal lines
correspond to the two different choices of the threshold functions mentioned above. The solution
of $d(F_n, F_n^h) = s(n)$ can be computed numerically since the function $h \mapsto d(F_n, F_n^h)$ is continuous.
In Eggermont and LaRiccia (1996), a secant method is proposed for solving this equation, but
we use the related regula falsi which we found to be more stable. The possibility of using an
iterative method makes selection of the bandwidth using the discrepancy principle quite fast in
comparison to other methods (like cross-validation) where one usually has to evaluate some criterion
on a grid of possible bandwidths. In addition, well-known formulas exist for calculating Kolmogorov- and
Kuiper-distances (for $k = 1$) for two distribution functions and these can be applied if $K$ is a
probability density.
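The numerical solution of the discrepancy equation can be sketched as follows for the Kolmogorov distance and a Gaussian kernel. This is a minimal illustration, not the implementation used for the results reported here: plain bisection stands in for the regula falsi mentioned in the text, and the function names, the bracket $[10^{-4}, 100]$, the seed and the sample are assumptions made for this example. Since the distance is not necessarily monotone in $h$, bisection returns some solution inside the bracket, not necessarily the smallest or largest one.

```python
import math
import random

def gaussian_cdf(z):
    """Distribution function of the standard Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kolmogorov_distance(sample, h):
    """sup_x |F_n(x) - F_n^h(x)| for a Gaussian kernel with bandwidth h.

    F_n^h is continuous and nondecreasing here, so the supremum is found
    by comparing it at each order statistic with the value of F_n just
    before and just after the jump.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        smooth = sum(gaussian_cdf((x - xj) / h) for xj in xs) / n
        worst = max(worst, abs(smooth - i / n), abs(smooth - (i + 1) / n))
    return worst

def solve_bandwidth(sample, threshold, lo=1e-4, hi=100.0, iterations=60):
    """Bisection for d(F_n, F_n^h) = threshold on the (assumed) bracket [lo, hi]."""
    def gap(h):
        return kolmogorov_distance(sample, h) - threshold
    if gap(lo) >= 0.0 or gap(hi) <= 0.0:
        raise ValueError("bracket does not enclose a solution")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(1)  # hypothetical seed, only to make the sketch reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
threshold = 0.6 * len(sample) ** -0.5  # s(n) = 0.6 n^{-1/2}
h_star = solve_bandwidth(sample, threshold)
print(h_star, kolmogorov_distance(sample, h_star))
```

Each evaluation of the distance costs $O(n^2)$ kernel evaluations in this naive form, but only a few dozen evaluations are needed, in contrast to a grid search over candidate bandwidths.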

In the following, we frequently need the function

$$F_h := F * K_h.$$

In case $K_h$ is a probability density, $F_h$ is a probability distribution function, otherwise it is the
distribution function of a signed measure.

The proof of the following Lemma is based on basic properties of convolutions and the Law of
the Iterated Logarithm, see Mildenberger (2011), p. 29, for details.



Lemma 2.2. With probability 1,

1. $d(F_n, F) = O\left((\log\log n / n)^{1/2}\right)$ and

2. $d(F_n^h, F_h) = O\left((\log\log n / n)^{1/2}\right)$ uniformly in $h$.

The next theorem shows that bandwidths chosen using the discrepancy principle converge to
0 almost surely. This result will be needed later on for obtaining more precise statements about
the behavior of the selected bandwidths. At this point, $F$ must be continuous but does not need
to have a density. As a by-product, the theorem also shows that the resulting estimator for the
distribution function is always consistent w.r.t. $d$, although our aim is to estimate the density
rather than the distribution function. The proof of the second assertion is based on similar Fourier
arguments as the proof of Theorem 3 in Yamamoto (1973).



Theorem 2.1. Let $F$ be a continuous distribution function, $F_n$ and $F_n^h$ as above and $s(n) = o(1)$.
For the bandwidth $h_{s,n}$ chosen as a solution of

$$d(F_n, F_n^h) = s(n)$$

we have almost surely

1. $d(F, F_n^{h_{s,n}}) \to 0$ and

2. $h_{s,n} \to 0$.

Proof. 1. With probability 1, we have:

$$d(F, F_{h_{s,n}}) \le d(F, F_n) + d(F_n, F_n^{h_{s,n}}) + d(F_n^{h_{s,n}}, F_{h_{s,n}}) = O\left((\log\log n / n)^{1/2}\right) + s(n) + O\left((\log\log n / n)^{1/2}\right) = o(1),$$

and hence

$$d(F, F_n^{h_{s,n}}) \le d(F, F_{h_{s,n}}) + d(F_{h_{s,n}}, F_n^{h_{s,n}}) = o(1).$$

2. According to the first part, $d_\infty(F, F_{h_{s,n}}) \le d(F, F_{h_{s,n}}) \to 0$ with probability 1; it remains to
show that this implies $h_{s,n} \to 0$. In the following $h_n := h_{s,n}$ denotes the sequence of bandwidths
chosen, $P$ denotes the probability measure associated with $F$ and $\mu_{h_n}$ the (signed) measure with
Lebesgue density $K_{h_n}$. Denote by $\hat{P}$, $\hat{K}$ and $\hat{K}_{h_n}$ the Fourier transforms of $P$, $K$ and $K_{h_n}$,
respectively. Observing that the sequence $(|P * \mu_{h_n}|)_{n \in \mathbb{N}}$ is tight (Mildenberger 2011, pp. 30-31)
and combining Proposition 8.1.8 in Bogachev (2007) with a result on page 173/174 in Katznelson
(2004), it follows that $\hat{P}(t)\hat{K}_{h_n}(t) \to \hat{P}(t)$ for all $t \in \mathbb{R}$. Because of the continuity of the Fourier
transform, we must have $|\hat{P}| > 0$ on an interval $[-\varepsilon, \varepsilon]$ for some $\varepsilon > 0$, which implies that $\hat{K}_{h_n}(t) =
\hat{K}(h_n t) \to 1$ for all $t \in [-\varepsilon, \varepsilon]$. Since $\int u^\ell K(u)\,du \ne 0$, $\hat{K}$ cannot be identically 1 on any interval
around zero. But this implies that $h_n \to 0$. □
While the previous results hold for any continuous distribution function $F$, for the remainder
of the paper we suppose that a Lebesgue density $f$ exists.

For consistency of the kernel density estimate with bandwidth chosen by the discrepancy
principle, we also need that the chosen bandwidth does not go to 0 too quickly. This can be
guaranteed under rather mild conditions. For $0 < \alpha \le 1$, let

$$C^{0,\alpha} := \left\{ F : \mathbb{R} \to \mathbb{R} \;:\; \exists\, C > 0 \text{ with } \sup_{x, y \in \mathbb{R},\, x \ne y} |F(x) - F(y)| / |x - y|^\alpha \le C \right\}$$

denote the set of all Hölder continuous functions with exponent $\alpha$. Smoothness of the distribution
function follows from integrability assumptions on the density. We define for $p \in [1, \infty)$

$$L_p(\mathbb{R}) := \left\{ f : \|f\|_p < \infty \right\} \quad \text{and} \quad \|f\|_p := \left( \int |f|^p\,d\lambda \right)^{1/p},$$

where $\lambda$ denotes the Lebesgue measure on (the Borel sets of) $\mathbb{R}$. Then we have:

Lemma 2.3. Let $f$ denote a probability density and $F$ the corresponding distribution function.
For $p \in (1, \infty)$ we have

$$f \in L_p(\mathbb{R}) \implies F \in C^{0,(p-1)/p}.$$

Proof. For any $x < y \in \mathbb{R}$ and $q = p/(p-1)$ we obtain using the Hölder inequality

$$|F(y) - F(x)| \le \|f\|_p\, (y - x)^{1/q}$$

and hence $F \in C^{0,(p-1)/p}$. □



This implies for example that for any square-integrable density (i.e., $f \in L_2(\mathbb{R})$) $F$ is Hölder-
continuous with exponent $\alpha = 1/2$. We also observe that, using a similar argument, for any bounded
$f$ the corresponding distribution function $F$ is Hölder-continuous with exponent $\alpha = 1$.

The next theorem shows that $L_1$-consistency of a kernel density estimator with bandwidth chosen
by the discrepancy principle can be guaranteed if the distribution function is Hölder continuous
with a sufficiently large exponent and the threshold function goes to 0 slowly enough.

Theorem 2.2. Let $K$ be a kernel of order $\ell$, $\ell \ge 1$, and $f$ a density with associated distribution
function $F$ such that $F \in C^{0,\alpha}$ for some $0 < \alpha \le 1$. If the threshold function $s(n)$ is such that
$\sqrt{\log\log n / n} = o(s(n))$ and $n^\alpha s(n) \to \infty$ for $n \to \infty$, then with probability 1 we have that

$$n h_{s,n} \to \infty.$$

Proof. The Hölder condition $F \in C^{0,\alpha}$ implies that there is a constant $A > 0$ such that $d_\infty(F, F_h) \le
A h^\alpha$, cf. Shapiro (1969), Theorem 20. With probability 1, we have that

$$n^\alpha s(n) = n^\alpha d(F_n, F_n^{h_{s,n}}) \le c_d\, n^\alpha \left( d_\infty(F_n, F) + d_\infty(F, F_{h_{s,n}}) + d_\infty(F_{h_{s,n}}, F_n^{h_{s,n}}) \right) \le A c_d\, n^\alpha h_{s,n}^\alpha + n^\alpha\, O\left((\log\log n / n)^{1/2}\right),$$

which implies that

$$A c_d\, n^\alpha h_{s,n}^\alpha \ge n^\alpha s(n)(1 + o(1)),$$

and hence, since $n^\alpha s(n) \to \infty$, that $n h_{s,n} \to \infty$. □

Corollary 2.1. If $K$ is a probability density and $f$ and $s$ are such that the conditions of Theorem
2.2 are fulfilled, we have

$$\lim_{n \to \infty} \int |f_n^{h_{s,n}}(x) - f(x)|\,dx = 0$$

with probability 1.

Proof. Under the stated conditions, part two of Theorem 2.1 and Theorem 2.2 yield that almost
surely $h_{s,n} \to 0$ and $n h_{s,n} \to \infty$, which by Theorem 1 in Chapter 6 of Devroye and Györfi
(1985) implies $\lim_{n \to \infty} \int |f_n^{h_{s,n}}(x) - f(x)|\,dx = 0$ almost surely. □

From Corollary 2.1 we have that almost sure $L_1$-consistency can be guaranteed for the threshold
function $s(n) = 0.35 n^{-2/5}$ suggested in Eggermont and LaRiccia (1996), $K$ a probability
density and $f \in L_2$.

Although the conditions for consistency are rather weak, the resulting density estimate may 
be inconsistent if the distribution function is too rough or the threshold function vanishes too 
quickly: 

Theorem 2.3. Let $K$ be a kernel and $0 < \varepsilon < 1/2$ such that $n^\varepsilon s(n) = o(1)$. Let $F_n$ denote
the empirical distribution function of an iid sample drawn from a distribution with density $f$ and
distribution function $F$. Suppose there exist constants $c, h_0 > 0$ such that

$$d_\infty(F, F_h) \ge c h^\varepsilon$$

for all $0 < h < h_0$. Then, if $h_{s,n}$ is a solution of $d(F_n, F_n^h) = s(n)$, we have:

1. $n h_{s,n} \to 0$ with probability 1 and

2. if $K$ is compactly supported and there exist $a, b > 0$ such that $\lambda\{x : f(x) \ge b\} \ge a$, where $\lambda$
denotes Lebesgue measure on $\mathbb{R}$, then $\liminf_{n \to \infty} \|f_n^{h_{s,n}} - f\|_1 \ge ab > 0$ with probability 1.

Proof. 1. It follows that, with probability 1,

$$c\, n^\varepsilon h_{s,n}^\varepsilon \le n^\varepsilon d_\infty(F, F_{h_{s,n}}) \le n^\varepsilon \left( d_\infty(F, F_n) + d(F_n, F_n^{h_{s,n}}) + d_\infty(F_n^{h_{s,n}}, F_{h_{s,n}}) \right) = n^\varepsilon\, O\left((\log\log n / n)^{1/2}\right) + n^\varepsilon s(n) = o(1),$$

and hence $n h_{s,n} = o(1)$.

2. If the support of $K$ is contained within a compact interval $I$, then, since $\lambda\{K_{h_{s,n}} \ne 0\} \le h_{s,n} \lambda(I)$,
we have almost surely

$$\lambda\{f_n^{h_{s,n}} \ne 0\} \le n h_{s,n} \lambda(I) = o(1)$$

because of the first assertion. It then follows almost surely that

$$\liminf_{n \to \infty} \int |f_n^{h_{s,n}}(x) - f(x)|\,dx \ge \liminf_{n \to \infty} \int_{\{f \ge b\} \cap \{f_n^{h_{s,n}} = 0\}} f(x)\,dx \ge ab.$$

□

In the following example, we consider a family of densities with an infinite peak and see that
using the discrepancy principle can lead to consistent or inconsistent estimates depending on
the sharpness of the peak:

Example 2.1. Let

$$K(x) = (3/4)(1 - x^2)\, I(|x| \le 1) \tag{3}$$

denote the Epanechnikov kernel and let $s(n)$ be a given threshold function. Consider the distribution
of $X := U^\beta$ for $\beta \in [1, \infty)$, where $U$ is uniformly distributed on $[0, 1]$. With $\varepsilon = 1/\beta$ the density of $X$ is given by

$$f(x) := \begin{cases} \varepsilon x^{-(1-\varepsilon)} & 0 < x \le 1 \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$

The distribution function of $X$ is given by

$$F(x) := \begin{cases} 0 & x < 0 \\ x^\varepsilon & 0 \le x \le 1 \\ 1 & x > 1. \end{cases} \tag{5}$$

It is easy to see that $F \in C^{0,\alpha}$ iff $\alpha \le \varepsilon$.

First consider the case that $\sqrt{\log\log n / n} = o(s(n))$ and $n^\varepsilon s(n) \to \infty$. Then the conditions of
Theorem 2.2 are fulfilled and the estimator will be consistent w.r.t. $L_1$-distance. Note that if
$\sqrt{\log\log n / n} = o(s(n))$ then we trivially have $n^\varepsilon s(n) \to \infty$ for all $\varepsilon \ge 1/2$.

Now consider the case that $0 < \varepsilon < 1/2$ and $n^\varepsilon s(n) = o(1)$.
By elementary integration, we obtain

$$|F_h(0) - F(0)| = \int_0^1 K(u)\,(hu)^\varepsilon\,du = \frac{3}{4}\left( \frac{1}{1+\varepsilon} - \frac{1}{3+\varepsilon} \right) h^\varepsilon =: c\,h^\varepsilon, \quad c > 0,$$

for $h \le 1$ and hence

$$d_\infty(F, F_h) \ge c h^\varepsilon$$

for $h < 1 =: h_0$. Since $K$ is compactly supported, inconsistency w.r.t. the $L_1$ distance directly
follows from the second assertion of Theorem 2.3.



3 Rates for the bandwidths 



In the following, we consider threshold functions $s(n)$ that go to 0 at different speeds:

• $s(n) = o\left((\log\log n / n)^{1/2}\right)$ (Theorem 3.1),

• $s(n) \asymp (\log\log n / n)^{1/2}$ (Theorem 3.2),

• $(\log\log n / n)^{1/2} = o(s(n))$ (Theorem 3.3).



The versions of the discrepancy principle for kernel estimators previously proposed in the 
literature can be obtained by choosing a threshold function that belongs to one of these classes. 

To obtain more precise statements about the order of the chosen bandwidth, we need some
additional assumptions on $f$ and $K$. In this section, we suppose that $f$ is in a Sobolev space defined
by

$$W^{\ell,1} := \{ f : f, f^{(1)}, \dots, f^{(\ell)} \in L_1(\mathbb{R}) \},$$

where $\ell \ge 2$ is the order of the kernel $K$.

The following Lemma is a slight generalization of a similar result by Eggermont and LaRiccia
(1996, 2001), who only considered nonnegative symmetric kernels of order $\ell = 2$ and $d = d_\infty$. The
proof is left out since the first part is completely analogous to Eggermont and LaRiccia (2001,
Lemma 6.15 a) and the second part is easy.

Lemma 3.1. Suppose that $f \in W^{\ell,1}(\mathbb{R})$ and $K$ is a kernel of order $\ell \ge 2$. Then, with $k_\ell := \int u^\ell K(u)\,du$, we have:

1. $F_h(x) - F(x) = \frac{(-1)^\ell}{\ell!} k_\ell\, f^{(\ell-1)}(x)\, h^\ell (1 + o(1))$ uniformly in $x \in \mathbb{R}$.

2. $d(F_h, F) = \frac{1}{\ell!} |k_\ell|\, d(f^{(\ell-1)}, 0)\, h^\ell (1 + o(1))$.

The approximations given in Lemma 3.1 are only valid for sufficiently small $h$. Since by
Theorem 2.1 for $n \to \infty$ we have that $h_{s,n} \to 0$ almost surely with $h_{s,n}$ chosen by the discrepancy
principle, terms of order $o(1)$ for $h \to 0$ are also of order $o(1)$ for $n \to \infty$.
The most simple and intuitive implementation of the discrepancy principle is based on a
goodness-of-fit test for a fixed level independent of $n$: the Data Approximation approach prescribes
that for a given data set, one should choose the simplest model that could have generated
the data (Davies 2008). For kernel density estimation, this results in the discrepancy principle
(2) with $d = d_\infty$ or $d = d_{\mathrm{kuip},k}$ and $s(n) = c n^{-1/2}$ with $c$ chosen as an appropriate quantile
of $\sqrt{n}\, d(F_n, F)$. Generally, the Data Approximation approach seems to suggest using extreme
quantiles (95%, 99%). In Example 10 of Davies (1995), a discrepancy principle based on the
98%-quantile of the Kuiper distance is used, which is then combined with a further criterion, the
so-called extreme value feature. In contrast, (for the Kolmogorov distance) Vapnik (2000, Ch.
7.5.1) suggests using the median or even the mode, which is approximately located at 0.74. Using
$c = 0.6$ is suggested in Vapnik (1998), Markovich (1989) suggests $c = 0.7$ or $c = 0.5$. With $c = 0.6$,
the estimated distribution function is required to lie in a 14% confidence band. However, $c$ has
no effect on the rate with which $h_{s,n}$ converges to 0:



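Theorem 3.1. For $f \in W^{\ell,1}$, $K$ kernel of order $\ell$ and $s(n) = O\left(\sqrt{\log\log n / n}\right)$ we have

$$h_{s,n} = O\left( n^{-1/(2\ell)} (\log\log n)^{1/(2\ell)} \right)$$

almost surely.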



Proof. According to Lemma 3.1 we have a.s.

$$\frac{1}{\ell!} |k_\ell|\, \|f^{(\ell-1)}\|_\infty\, h_{s,n}^\ell (1 + o(1)) = d_\infty(F_{h_{s,n}}, F) \le d_\infty(F, F_n) + d(F_n, F_n^{h_{s,n}}) + d_\infty(F_n^{h_{s,n}}, F_{h_{s,n}}) = O\left((\log\log n / n)^{1/2}\right) + s(n) = O\left((\log\log n / n)^{1/2}\right).$$

The second term in parentheses on the left-hand side is not only of order $o(1)$ for $h_{s,n} \to 0$, but
also $o(1)$ for $n \to \infty$ since, by Theorem 2.1, $n \to \infty$ almost surely implies $h_{s,n} \to 0$. Solving for
$h_{s,n}$ then proves the claim. □



Theorem 3.1 shows that for $f \in W^{\ell,1}$ and $K$ kernel of order $\ell$, an upper bound for the
bandwidth (and hence the bandwidth itself) converges to 0 at a faster rate than the optimal
bandwidths according to most criteria, which behave like $h \asymp n^{-1/(2\ell+1)}$ (although this problem
becomes less severe as $\ell$ increases). The reason is that density estimation is an ill-posed problem
that requires regularization. For sufficiently large $n$, the Kolmogorov-Smirnov test with fixed
level will detect the difference between $F$ and $F_h$, even if $h$ is chosen optimally. This leads
to a bandwidth that is too small. The incompatibility of optimal bandwidths with confidence
sets based on the Kolmogorov-Smirnov or Kuiper tests has also been observed in Davies (1995),
Eggermont and LaRiccia (1996) and Hjort and Walker (2001). Asymptotically, the estimated
distribution function is too close to the empirical distribution function, leading to undersmoothing.
However, the simulations in Section 4 show that discrepancy principles based on extreme quantiles
of goodness-of-fit tests still oversmooth even for sample sizes as large as $n = 2500$, while the version
proposed by Vapnik ($c = 0.6$) works quite well for the sample sizes considered.



Theorem 3.1 is applicable to threshold functions of the form $s(n) = c \sqrt{\frac{\log\log n}{2n}}$, but more
precise results are possible when $c$ is large enough. A threshold of this form is motivated by the
law of the iterated logarithm for $d(F_n, F)$, and is in a sense the closest analogue to the upper
bound on the error in deterministic inverse problems. Aidu and Vapnik (1989) considered the case
where $c = (1 + \kappa + \varepsilon)$ for kernels $K$ of order $\ell$, $\kappa = \|K\|_1$ and $d = d_\infty$. The next theorem is a
slight extension of their theorem (Aidu and Vapnik 1989, Sec. 3) that now additionally includes
the case of $d = d_{\mathrm{kuip},k}$ and has essentially the same proof, see pp. 38 in Mildenberger (2011) for
details.
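Theorem 3.2. For $f \in W^{\ell,1}$, $K$ kernel of order $\ell \ge 2$, $\kappa = \|K\|_1$ and $s(n) = c_d (\kappa + 1 + \varepsilon) (\log\log n / 2n)^{1/2}$, we have with probability 1:

$$\liminf_{n \to \infty} \frac{h_{s,n}}{(\log\log n / 2n)^{1/(2\ell)}} \ge \left( \frac{c_d\, \varepsilon\, \ell!}{|k_\ell|\, d(f^{(\ell-1)}, 0)} \right)^{1/\ell}$$

and

$$\limsup_{n \to \infty} \frac{h_{s,n}}{(\log\log n / 2n)^{1/(2\ell)}} \le \left( \frac{c_d (2\kappa + 2 + \varepsilon)\, \ell!}{|k_\ell|\, d(f^{(\ell-1)}, 0)} \right)^{1/\ell}.$$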



The theorem gives an upper and a lower bound on the selected bandwidth which are of the
same order, and which again go to 0 faster than the optimal bandwidths according to most criteria.

Exact results on the limiting behavior of the bandwidth chosen by the discrepancy principle
can be obtained in the case where $s(n)$ converges to 0 at a slower rate than $d(F_n, F)$. Noting
that discrepancy principles based on fixed quantiles or the law of the iterated logarithm lead to
undersmoothing, Eggermont and LaRiccia (1996, 2001) introduce a rate-corrected version. For a
symmetric, nonnegative kernel, they propose to choose $h$ as a solution of

$$d_\infty(F_n, F_n^h) = 0.35 n^{-2/5}.$$

The choice of the exponent implies that the smoothing parameter goes to 0 at the optimal rate.

The next theorem is a generalization of the main result in Eggermont and LaRiccia (1996) and
Chapter 7.6 of Eggermont and LaRiccia (2001). Our version is also applicable in the case of
$d = d_{\mathrm{kuip},k}$ and allows for higher order kernels.

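Theorem 3.3. For $f \in W^{\ell,1}$, $K$ kernel of order $\ell$ and $s(n) = c n^{-\gamma}$ for $c > 0$ and $0 < \gamma < 1/2$
we have almost surely:

$$h_{s,n} = \left( \frac{c\, \ell!}{|k_\ell|\, d(f^{(\ell-1)}, 0)} \right)^{1/\ell} n^{-\gamma/\ell} (1 + o(1)). \tag{6}$$

Proof. Using the triangle inequality, we have with probability 1 that

$$\left| d(F_n, F_n^{h_{s,n}}) - d(F, F_{h_{s,n}}) \right| \le d(F_{h_{s,n}}, F_n^{h_{s,n}}) + d(F, F_n) = O\left((\log\log n / n)^{1/2}\right).$$

Combining this with Lemma 3.1 and again observing that, by Theorem 2.1, the $o(1)$ term for
$h_{s,n} \to 0$ is also of order $o(1)$ for $n \to \infty$, we have

$$\frac{1}{\ell!} |k_\ell|\, d(f^{(\ell-1)}, 0)\, h_{s,n}^\ell (1 + o(1)) = c n^{-\gamma} + O\left((\log\log n / n)^{1/2}\right),$$

which implies that

$$h_{s,n} = \left( \frac{c\, \ell!}{|k_\ell|\, d(f^{(\ell-1)}, 0)} \right)^{1/\ell} n^{-\gamma/\ell} (1 + o(1)).$$

□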



The theorem implies that for a kernel of order $\ell$ and a threshold function of the form $s(n) =
c n^{-\gamma}$ with $\gamma = \ell/(2\ell + 1)$ the chosen bandwidth is, for sufficiently smooth $f$, of the optimal
order $h = a n^{-1/(2\ell+1)}$ with respect to the $L_1$ or $L_2$ risks. The constant $a$ depends on $c$ and the
unknown true density $f$ and is not equal to the optimal one according to any of the standard
criteria. Eggermont and LaRiccia choose $c = 0.35$ based on simulations. Noting that $s(n) =
c n^{-2/5} = (c n^{1/10}) n^{-1/2}$ we can interpret the threshold function in terms of confidence levels that
depend on $n$. For $c = 0.35$, the confidence level is below 0.5 up to $n = 5624$.
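The interpretation of the threshold as an $n$-dependent confidence level can be sketched numerically. The following example assumes the classical limiting distribution of $\sqrt{n}\, d_\infty(F_n, F)$ (the Kolmogorov distribution) as an approximation for finite $n$; the function names are hypothetical, and the rounded median 0.83 is the value used for the KS .5 method in Section 4.

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Limiting distribution function of sqrt(n) * d_inf(F_n, F).

    Uses the alternating series 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2),
    which is adequate for the moderate arguments needed here.
    """
    if x <= 0.0:
        return 0.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return 1.0 - 2.0 * total

def confidence_level(n, c=0.35, gamma=0.4):
    """Approximate level of the band d_inf <= c * n**(-gamma).

    Rewrites the threshold as (c * n**(1/2 - gamma)) * n**(-1/2) and plugs
    the first factor into the limiting distribution function.
    """
    return kolmogorov_cdf(c * n ** (0.5 - gamma))

# Sample size at which 0.35 * n**(1/10) reaches the rounded median 0.83.
crossover = (0.83 / 0.35) ** 10

print(confidence_level(100), confidence_level(2500), confidence_level(10000))
print(kolmogorov_cdf(0.6))  # level of the band for c = 0.6 in s(n) = c n^{-1/2}
print(crossover)
```

Under these assumptions the level stays below 0.5 for the sample sizes used in the simulations, and the value at $c = 0.6$ is consistent with the 14% confidence band mentioned for Vapnik's proposal.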

In principle, constants suitable for other classes of densities, other distances or higher order
kernels can also be chosen using simulations. But Theorem 3.3 also allows for a different approach:
Discrepancy principles that can be guaranteed to asymptotically choose the optimal bandwidths
for a reference density. In the following example, we will sketch this approach for the normal
distribution and the $L_2$-optimal bandwidth.
Example 3.1. The asymptotically $L_2$-optimal bandwidth for a kernel of order $\ell$ is given by

$$h_{\mathrm{opt}} = \left( \frac{(\ell!)^2\, \|K\|_2^2}{2\ell\, k_\ell^2\, \|f^{(\ell)}\|_2^2} \right)^{1/(2\ell+1)} n^{-1/(2\ell+1)} \tag{7}$$

(Wand and Jones 1995, p. 33). Equating (6) and (7) yields

$$\gamma = \frac{\ell}{2\ell + 1}$$

and

$$c = \left( \frac{|k_\ell|\, \|K\|_2^{2\ell}}{(2\ell)^\ell\, \ell!} \right)^{1/(2\ell+1)} \cdot \frac{d(f^{(\ell-1)}, 0)}{\|f^{(\ell)}\|_2^{2\ell/(2\ell+1)}}. \tag{8}$$

The first factor only depends on the kernel and is invariant w.r.t. rescaling of the kernel. The
second factor only depends on the shape of $f$ and does not change when $f$ is translated or rescaled.
For $f$ the standard normal density, we obtain $c = 0.1357$ for the Gaussian and $c = 0.1331$ for the
Epanechnikov kernel (3) when using $d = d_\infty$. Both values are much smaller than $c = 0.35$ as
suggested by Eggermont and LaRiccia (1996) independent of the kernel. This choice forces the
estimate of the distribution function to lie extremely close to the empirical distribution function,
causing severe undersmoothing even for very large sample sizes. For $d = d_{\mathrm{kuip},1}$ we obtain
$c = 0.2715$ for the Gaussian and $c = 0.2661$ for the Epanechnikov kernel. Similar calculations can
be carried out for a bandwidth minimizing an upper bound for the $L_1$ risk, but these lead to even
smaller values of $c$ (Mildenberger 2011, pp. 44-45).
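The constants quoted in the example can be checked with a small calculation for second-order kernels and the standard normal reference density. The sketch below is an illustration under stated assumptions: it takes $k_\ell = \int u^\ell K(u)\,du$, uses the standard closed-form values for the Gaussian and Epanechnikov kernels and for the normal density, and the function name is hypothetical.

```python
import math

def reference_constant(k_ell, kernel_l2_sq, deriv_dist, f_ell_l2_sq, ell=2):
    """Constant c such that s(n) = c * n**(-ell / (2 ell + 1)) reproduces the
    asymptotically L2-optimal bandwidth for the reference density.

    k_ell        : ell-th moment of the kernel
    kernel_l2_sq : squared L2 norm of the kernel
    deriv_dist   : d(f^(ell-1), 0), e.g. sup |f'| for the Kolmogorov distance
    f_ell_l2_sq  : squared L2 norm of the ell-th derivative of the density
    """
    fact = math.factorial(ell)
    # Constant in front of n**(-1 / (2 ell + 1)) in the optimal bandwidth.
    a = ((fact ** 2 * kernel_l2_sq)
         / (2 * ell * k_ell ** 2 * f_ell_l2_sq)) ** (1.0 / (2 * ell + 1))
    # Plug the optimal bandwidth into the leading term of d(F_h, F).
    return abs(k_ell) * deriv_dist / fact * a ** ell

# Standard normal density: sup |f'| is attained at x = +-1 and equals phi(1);
# the integral of (f'')^2 equals 3 / (8 sqrt(pi)).
phi_1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
f2_l2_sq = 3.0 / (8.0 * math.sqrt(math.pi))

# Gaussian kernel: second moment 1, squared L2 norm 1 / (2 sqrt(pi)).
c_gauss = reference_constant(1.0, 1.0 / (2.0 * math.sqrt(math.pi)), phi_1, f2_l2_sq)
# Epanechnikov kernel: second moment 1/5, squared L2 norm 3/5.
c_epan = reference_constant(0.2, 0.6, phi_1, f2_l2_sq)

# For the Kuiper distance with k = 1, d(f', 0) = sup f' - inf f' = 2 * phi(1).
c_gauss_kuiper = reference_constant(1.0, 1.0 / (2.0 * math.sqrt(math.pi)), 2.0 * phi_1, f2_l2_sq)
c_epan_kuiper = reference_constant(0.2, 0.6, 2.0 * phi_1, f2_l2_sq)

print(c_gauss, c_epan, c_gauss_kuiper, c_epan_kuiper)
```

The four printed values agree with the constants stated in the example up to rounding in the fourth decimal place.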



4 Simulation Study 

In this section, we explore how well different versions of the discrepancy principle work in practice.
Mainly in the 1980s and 1990s, several large simulation studies on bandwidth choice methods
for kernel density estimators have been conducted, of which we just mention Cao et al. (1994)
(with a focus on the $L_2$-risk) and Berlinet and Devroye (1994) and Devroye (1997) in an $L_1$-
context. To the best of our knowledge, there is no larger simulation study on kernel estimators
that includes any version of the discrepancy principle, although in Devroye (1997) the version
proposed in Eggermont and LaRiccia (1996) is mentioned but not included in the study. There
are some smaller simulation studies to be found in the publications in which a particular version of
the discrepancy principle is suggested or directly building on these (Markovich 1989, Eggermont
and LaRiccia 1996, 2001). Our simulation study is a replication of a part of the more extensive
study described in Mildenberger (2011). The aim is not to find a 'best' method but to explore
whether methods based on the discrepancy principle perform reasonably well at all. Since the
discrepancy principle is not designed with any specific risk in mind, we look at both the $L_1$ and
$L_2$-risk (where applicable).

We use the Epanechnikov kernel as given in (3) and choose the bandwidth as a solution of
$d(F_n, F_n^h) = s(n)$. In Eggermont and LaRiccia (1996), a secant method is proposed for solving
this equation, but we use the related regula falsi which we found to be more stable. Occasionally,
there may be multiple solutions but we ignore this and take the first solution found.

We compare the following versions of the discrepancy principle: 

• Two versions based on the 0.5 and 0.95 quantiles of the Kolmogorov-Smirnov statistic:
$d = d_\infty$ and $s(n) = c\,n^{-1/2}$ with c = 0.83 and c = 1.36. These methods are denoted by
KS .5 and KS .95, respectively.

• The version proposed by Vapnik: $d = d_\infty$ and $s(n) = 0.6\,n^{-1/2}$. Denoted by V.

• The rate-corrected version proposed by Eggermont and LaRiccia: $d = d_\infty$ and
$s(n) = 0.35\,n^{-2/5}$. Denoted by E-LR. In contrast to the other versions considered here, this one
uses a threshold function for which the assumptions in Theorem 2.2 are fulfilled.

• Two versions based on the 0.5 and 0.95 quantiles of the Kuiper statistic: $d = d_{\mathrm{kuip},1}$ and
$s(n) = c\,n^{-1/2}$ with c = 1.22 and c = 1.75. Denoted by Kuip .5 and Kuip .95, respectively.

• The method based on a normal reference density as given in Example 3.1: $d = d_\infty$ and
$s(n) = 0.1331\,n^{-2/5}$. Denoted by L2NR.
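
For reference, the threshold functions of the versions listed above can be written down compactly. The following Python sketch only restates the constants and rates from the list; the dictionary name and layout are our own.

```python
# Threshold functions s(n) of the compared versions (constants as listed above).
# The two Kuip entries are meant to be paired with the Kuiper distance, all
# others with the supremum distance d_inf.
THRESHOLDS = {
    "KS .5":    lambda n: 0.83 * n ** -0.5,
    "KS .95":   lambda n: 1.36 * n ** -0.5,
    "V":        lambda n: 0.6 * n ** -0.5,
    "E-LR":     lambda n: 0.35 * n ** -0.4,
    "Kuip .5":  lambda n: 1.22 * n ** -0.5,
    "Kuip .95": lambda n: 1.75 * n ** -0.5,
    "L2NR":     lambda n: 0.1331 * n ** -0.4,
}

# Thresholds at the three sample sizes used in the study.
for n in (100, 1000, 2500):
    values = {name: s(n) for name, s in THRESHOLDS.items()}
```

With these formulas, the E-LR threshold is below the V threshold at n = 100 and above it at n = 1000 and n = 2500, which agrees with the ordering of the mean bandwidths of E-LR and V in Table 3.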



For comparison, we include $L_2$ cross-validation as described in Celisse and Robin (2008) (their
Formula 13 with p = 1). This is denoted by L2CV. The more extensive simulations in Mildenberger
(2011) include several more variants of the discrepancy principle, a few more standard
methods for comparison, and all of the 28 densities from Berlinet and Devroye (1994). For the
sake of brevity, here we just focus on a smaller subset, but the conclusions are largely the same.
We draw 250 samples of sizes 100, 1000 and 2500 from 12 of the 28 test bed densities introduced
in Berlinet and Devroye (1994). For this, we use the R-package benchden (Mildenberger et al.;
Mildenberger and Weinert 2012) and the same numbering for the densities as in Berlinet and
Devroye (1994).



Figure 3 shows typical kernel estimates for a normal sample of size 100. In the first panel,
the $L_2$-optimal bandwidth (17) was chosen. The second panel shows the result obtained using V,
which gives a fairly good result. The bandwidth chosen using KS .95 is obviously too large,
although it will be too small asymptotically. The bandwidth in the fourth panel has been chosen
using L2NR. Although this will asymptotically result in the optimal bandwidth, the estimate is
severely undersmoothed.

The estimated $L_1$ and (squared) $L_2$ risks and the arithmetic means of the chosen bandwidths
for all densities and sample sizes considered here are given in Tables 1, 2 and 3, respectively. The
smallest risk for each scenario has been highlighted. Note that Table 2 omits densities 8 and 19,
since these are not in $L_2$.
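
The text does not spell out how the risks are approximated numerically. One plausible approach, sketched below in Python, averages the $L_1$ and squared $L_2$ losses over repeated samples, with the integrals replaced by Riemann sums on a grid. Everything here is an assumption for illustration: the grid, the number of repetitions, the fixed bandwidth rule and all function names are hypothetical, and the kernel is taken to be the Epanechnikov kernel on [-1, 1].

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (assumed scaling)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_estimate(grid, sample, h):
    """Kernel density estimate with bandwidth h, evaluated on a grid."""
    u = (grid[:, None] - sample[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h

def estimate_risks(draw_sample, true_density, choose_bandwidth, n, reps, grid, rng):
    """Monte Carlo estimates of the L1 risk and the squared L2 risk."""
    dx = grid[1] - grid[0]
    truth = true_density(grid)
    l1_losses, l2_losses = [], []
    for _ in range(reps):
        sample = draw_sample(n, rng)
        err = kernel_estimate(grid, sample, choose_bandwidth(sample)) - truth
        l1_losses.append(np.abs(err).sum() * dx)    # Riemann sum for int |f_hat - f|
        l2_losses.append((err**2).sum() * dx)       # Riemann sum for int (f_hat - f)^2
    return float(np.mean(l1_losses)), float(np.mean(l2_losses))

# Usage: standard normal density with a hypothetical fixed bandwidth.
rng = np.random.default_rng(1)
grid = np.linspace(-6.0, 6.0, 1201)
normal_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
l1_risk, l2_risk = estimate_risks(
    draw_sample=lambda n, rng: rng.standard_normal(n),
    true_density=normal_pdf,
    choose_bandwidth=lambda sample: 1.0,   # placeholder for a bandwidth selector
    n=100, reps=50, grid=grid, rng=rng,
)
```

A data-driven selector can be plugged in through `choose_bandwidth`. For densities with unbounded peaks or heavy tails, a fixed equidistant grid like this one would need refinement.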

In most cases, either L2CV or one of V and E-LR, which perform very similarly, is the best
method with respect to the $L_2$-risk. Although L2CV usually selects smaller bandwidths than V
and E-LR, the resulting risks are close in most cases. The methods based on quantiles of the
Kolmogorov or Kuiper statistics (KS .5, KS .95, Kuip .5 and Kuip .95) choose larger bandwidths,
which results in oversmoothing in most cases (although, by Theorem 3.1, these methods
asymptotically suffer from undersmoothing). The methods based on the Kuiper statistic are usually
better than those based on the corresponding quantiles of the Kolmogorov-Smirnov statistic.
The rather large amount of smoothing chosen by these methods is beneficial w.r.t. $L_1$ risk for the
Cauchy density (number 6), which is mainly due to the fact that the $L_1$ loss penalizes errors in
the tails quite heavily.

Figure 3: Kernel estimates for a normal sample (n = 100). The bandwidths used are the $L_2$-optimal
choice (17) (L2opt) and three variants of the discrepancy principle.

The method L2NR chooses bandwidths that are much smaller than those chosen by the other
methods. Although it is guaranteed to asymptotically choose the $L_2$-optimal bandwidth for the
normal density, the results for both $L_1$ and $L_2$ risk are poor even when the true density is the
normal (number 11). The small bandwidths seem to be helpful for capturing the fine structure of
the multimodal densities 23, 27 and (less pronounced) 24.

Except for 8, 15 and 19, all densities are bounded and hence fulfill the assumptions of Theorem
2.2, such that the estimate based on a bandwidth chosen using E-LR will be almost surely
consistent w.r.t. the $L_1$-distance.

Density 8 is the density of $U^2$, where U is a uniform random variable on [0, 1]. This corresponds
to the density in Example 2.1 for $\varepsilon = 1/2$, such that using E-LR to select the bandwidth will
result in a consistent estimate w.r.t. $L_1$-loss. The density is not in $L_2$. With respect to $L_1$-risk, at
least for larger sample sizes all versions of the discrepancy principle except L2NR perform better
than L2CV. The distribution function corresponding to density 15 is in $C^{0,\alpha}$ for any $\alpha < 1$, and
hence using E-LR to select the bandwidth will also lead to $L_1$-consistent estimates by Theorem
2.2 and Corollary 2.1. This density is in $L_2$. Again, L2CV performs worse than all variants of
the discrepancy principle except L2NR. Density number 19 is the density of $N^3$, where N is a
standard normal random variable. It is not in $L_2$ and it can be shown that

\F(h)-F*K h (h)\ = ^L/^(l + (l)), 
where F is the distribution function and $K_h$ the Epanechnikov kernel with bandwidth h. Hence, for
h small enough, the conditions of Theorem 2.3 are fulfilled with e = 1/3 and any < c < (y^ir)^ 1 . 
From this it follows that every version of the discrepancy principle considered in the simulation
study will lead to inconsistent estimates w.r.t. $L_1$-loss almost surely. The main theorem in
Devroye (1989), which states that selecting the bandwidth using L2CV will lead to $L_1$-inconsistent
estimators for any sufficiently sharply peaked density, is not applicable to density 19. However,
Table 1 shows that L2CV performs even worse than most versions of the discrepancy principle in
our simulations (with the versions choosing larger bandwidths doing relatively better).

Density     n     L2CV      E-LR         V     KS .5    KS .95   Kuip .5  Kuip .95      L2NR
      1   100   0.2518    0.2421*   0.2511    0.3018    0.4735    0.2884    0.3667    0.3871
         1000   0.1043    0.1095    0.1035*   0.1204    0.1831    0.1086    0.1341    0.1132
         2500   0.0774    0.0775    0.0734*   0.0803    0.1131    0.0744     0.085    0.0819
      6   100   0.3316    0.3324    0.3293*   0.3583    0.5169    0.3442    0.4163    0.6787
         1000   0.1637     0.158    0.1581    0.1627    0.2097    0.1576*   0.1714      0.23
         2500   0.1242    0.1247    0.1248    0.1258    0.1493    0.1237*   0.1294    0.1547
      8   100   0.3784    0.3892    0.3741    0.3521*   0.4426    0.3608    0.4368    0.9115
         1000   0.3584    0.2679    0.3021    0.2365    0.1892*   0.2222    0.1907    0.6479
         2500   0.3576    0.2361    0.2903    0.2214    0.1608*   0.2047    0.1628    0.5845
     11   100   0.1571    0.1562*   0.1586    0.2082    0.4108    0.1886    0.2861    0.4559
         1000   0.0613    0.0641    0.0604*   0.0735    0.1326    0.0627    0.0859    0.1047
         2500   0.0432    0.0472    0.0428*   0.0503    0.0864    0.0438    0.0567    0.0625
     12   100   0.2863    0.2758    0.2744*   0.2937    0.4023    0.2986    0.3684    0.5874
         1000   0.1313    0.1249*   0.1251    0.1267    0.1471    0.1272    0.1416     0.167
         2500   0.0959    0.0897*     0.09    0.0903    0.1019    0.0907    0.0992    0.1083
     13   100   0.3723    0.3627*   0.3647    0.4048     0.596    0.3922    0.4764     0.707
         1000   0.1945    0.1816    0.1808*   0.1862    0.2251    0.1813    0.1936    0.2213
         2500    0.151    0.1362    0.1363    0.1373    0.1583    0.1356*   0.1401    0.1627
     15   100   0.3072    0.2946     0.293*    0.305    0.3876    0.3175      0.38    0.5593
         1000   0.1668    0.1464    0.1534    0.1415     0.147    0.1399*   0.1482    0.2257
         2500   0.1375    0.1114    0.1217    0.1088    0.1036*   0.1047    0.1041    0.1742
     19   100   1.0525    1.0695     1.013    0.8101    0.7173*   0.8545    0.7274    1.6579
         1000   1.4559    0.9915     1.107    0.8598    0.5813*   1.0131    0.7704    1.6414
         2500   1.5706    0.9773    1.1507    0.9181    0.5981*   1.0842    0.8361    1.6392
     22   100   0.1979    0.1931*   0.1985    0.2477    0.3981    0.2423    0.3232    0.4385
         1000   0.0804*   0.0896    0.0832    0.1003    0.1531    0.0949    0.1299    0.1027
         2500   0.0566*   0.0649    0.0577    0.0683    0.1015    0.0641    0.0856    0.0644
     23   100   0.3829     0.379*   0.3826    0.4139    0.5278    0.4048    0.4592    0.4858
         1000   0.3498    0.2826    0.2364    0.3301    0.3546     0.274      0.35    0.1392*
         2500   0.2035    0.2158    0.1609    0.2371    0.3454    0.1813      0.27    0.0922*
     24   100   0.3775*   0.4421    0.4743    0.6363    0.8741    0.5901    0.7567    0.4773
         1000    0.173    0.2665    0.2369    0.3102    0.4793    0.2583    0.3487    0.1695*
         2500   0.1274    0.2035    0.1689    0.2174    0.3394    0.1805     0.239    0.1247*
     27   100   0.6238     0.583    0.5966    0.6489    0.7636    0.6376    0.6985    0.4514*
         1000   0.5965    0.5366    0.4653    0.5488    0.5857      0.52    0.5337    0.1682*
         2500   0.5339    0.4262    0.3127    0.4663    0.5398    0.3493    0.5147    0.1275*

Table 1: Results of the simulation study: Estimated $L_1$ risk for kernel estimators. An asterisk
marks the smallest displayed value in each row.
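
The statements about densities 8 and 19 in the discussion of the unbounded densities can be checked by elementary computations. The following lines are added for illustration and are not taken from the derivations behind Theorems 2.2 and 2.3. The labels $F_8$, $f_8$, $F_{19}$, $f_{19}$ are ours, and $\Phi$ denotes the standard normal distribution function.

```latex
% Density 8: X = U^2 with U uniform on [0, 1]
F_8(x) = P(U \le \sqrt{x}) = \sqrt{x}, \qquad
f_8(x) = \frac{1}{2\sqrt{x}} \quad (0 < x \le 1), \qquad
\int_0^1 f_8(x)^2 \, dx = \int_0^1 \frac{dx}{4x} = \infty .

% Density 19: X = N^3 with N standard normal
F_{19}(x) = \Phi\bigl(\operatorname{sgn}(x)\,|x|^{1/3}\bigr), \qquad
f_{19}(x) = \frac{1}{3\sqrt{2\pi}}\,|x|^{-2/3} \exp\bigl(-|x|^{2/3}/2\bigr),

F_{19}(x) - F_{19}(0) = \frac{x^{1/3}}{\sqrt{2\pi}}\,(1 + o(1)) \quad (x \downarrow 0), \qquad
f_{19}(x)^2 \sim \frac{1}{18\pi}\,|x|^{-4/3} \quad (x \to 0).
```

So $F_8$ behaves like $x^{1/2}$ and $F_{19}$ like $x^{1/3}$ near the peak, matching the exponents 1/2 and 1/3 quoted in the text. Neither density is square-integrable, because $x^{-1}$ and $|x|^{-4/3}$ are not integrable at zero.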

Overall, if the discrepancy principle is to be used for choosing a bandwidth, the simulations
suggest that V and E-LR would be the versions of choice. Although they perform similarly
in the simulation study, there are good theoretical reasons for preferring E-LR, as consistency
can be guaranteed for a large class of densities. Generally, the simulations
show that the asymptotic results are of limited use for the sample sizes considered here (even for
n = 2500!). This is not so much of a surprise since the asymptotic analysis is largely based on the
law of the iterated logarithm, whose $\log\log n$ terms grow extremely slowly.



5 Conclusions 

The discrepancy principle is a fast and simple method of parameter choice that is also easy to
implement. Although it is very popular in other branches of applied mathematics (namely in
ill-posed problems theory), it has only rarely been used in density estimation. While there are many
shortcomings (it is not optimal in any sense and it can even lead to inconsistent estimates for some
densities with infinite peaks), some variants do work surprisingly well for a large set of different
densities in simulations, and consistency can, at least for some versions, be guaranteed for a
large class of densities including all square-integrable ones. The simulations also show that the
behavior of methods based on the discrepancy principle may be quite different from the asymptotic
behavior even for sample sizes as large as n = 2500. Generally, asymptotic results do not help
much in choosing the threshold function s(n), the most striking example being the $L_2$ normal
reference version L2NR, which is guaranteed to asymptotically choose the $L_2$-optimal bandwidth
for the normal distribution but performs very poorly even when the true density is the normal.
Also taking into account the inconsistency for some densities (a problem that is actually shared
by many popular bandwidth selectors), the method cannot be recommended in general.

Density     n     L2CV      E-LR         V     KS .5    KS .95   Kuip .5  Kuip .95      L2NR
      1   100   0.0774    0.0726*   0.0755    0.0939     0.152    0.0889     0.117    0.2511
         1000   0.0205*   0.0248     0.022    0.0295    0.0526    0.0245    0.0349    0.0215
         2500    0.012*   0.0156     0.013    0.0169    0.0299    0.0138     0.019    0.0122
      6   100    0.008*    0.008*   0.0082    0.0129    0.0336    0.0111    0.0203    0.0529
         1000   0.0014*   0.0016    0.0014*   0.0021    0.0055    0.0016    0.0027    0.0028
         2500   0.0006*   0.0008    0.0006*   0.0009    0.0024    0.0007    0.0012     0.001
     11   100   0.0072     0.007    0.0069*   0.0104    0.0332    0.0088    0.0177     0.076
         1000   0.0011    0.0011     0.001*   0.0014     0.004    0.0011    0.0018    0.0036
         2500   0.0006    0.0006    0.0005*   0.0007    0.0017    0.0006    0.0008    0.0012
     12   100   0.0235    0.0213*   0.0221    0.0309    0.0639    0.0325    0.0541    0.1137
         1000   0.0043*   0.0049    0.0044    0.0059    0.0118     0.006    0.0103    0.0063
         2500   0.0022*   0.0027    0.0022*   0.0029    0.0058     0.003    0.0052    0.0025
     13   100    0.039    0.0379*   0.0394    0.0511    0.1017     0.048    0.0686    0.1124
         1000   0.0124*    0.015    0.0137     0.017    0.0264    0.0148    0.0192    0.0126
         2500   0.0077    0.0103    0.0088     0.011    0.0169    0.0094    0.0122    0.0076*
     15   100   0.3402    0.2946*   0.3034    0.3663    0.5308     0.397    0.5167    0.7157
         1000   0.1059    0.1104    0.1043*    0.122     0.181    0.1338    0.1836    0.1262
         2500   0.0688     0.075    0.0682*   0.0784    0.1128    0.0862    0.1168    0.0788
     22   100   0.0124    0.0119*   0.0125    0.0181    0.0337    0.0176    0.0263    0.0725
         1000   0.0021*   0.0028    0.0023    0.0036    0.0081    0.0032     0.006    0.0034
         2500    0.001*   0.0015    0.0011    0.0017    0.0039    0.0014    0.0027    0.0013
     23   100   0.0536*   0.0546    0.0544    0.0572    0.0768    0.0561    0.0639    0.1051
         1000   0.0485     0.034    0.0238    0.0457    0.0472    0.0319    0.0501    0.0074*
         2500   0.0236    0.0202     0.011    0.0244      0.05     0.014    0.0317    0.0033*
     24   100   0.0444*   0.0547    0.0597    0.0856     0.127    0.0777    0.1073    0.0813
         1000   0.0116*    0.028    0.0239     0.034    0.0578    0.0269    0.0392    0.0117
         2500    0.006*   0.0202    0.0153     0.022    0.0378     0.017    0.0248    0.0075
     27   100   0.0205    0.0198      0.02    0.0211    0.0239    0.0208    0.0222    0.0185*
         1000   0.0192    0.0188    0.0146    0.0191    0.0192    0.0178    0.0176    0.0023*
         2500   0.0171    0.0123     0.007    0.0146    0.0181    0.0085    0.0174    0.0014*

Table 2: Results of the simulation study: Estimated squared $L_2$ risk for kernel estimators. An
asterisk marks the smallest displayed value in each row (ties are all marked).
Density     n     L2CV      E-LR         V     KS .5    KS .95   Kuip .5  Kuip .95      L2NR
      1   100   0.2267    0.2206    0.2443    0.3609    0.6297    0.3342    0.4784    0.0292
         1000   0.0651    0.1073    0.0913    0.1286    0.2149    0.1055    0.1502    0.0363
         2500   0.0381     0.077    0.0599    0.0838     0.139    0.0663    0.0946    0.0277
      6   100   1.1911    1.1651    1.2968    1.9599    3.6053    1.7881    2.6064     0.148
         1000   0.6465    0.8374     0.725    0.9776    1.4834    0.8259    1.1175    0.2384
         2500   0.5264    0.7162    0.5793    0.7652    1.1179     0.644    0.8551    0.2483
      8   100   0.0862    0.0439    0.0511    0.0946    0.2425    0.1193    0.2323    0.0061
         1000   0.0036    0.0062    0.0046    0.0087    0.0232    0.0108    0.0223     0.001
         2500   0.0013     0.003    0.0018    0.0035    0.0092    0.0043    0.0088    0.0004
     11   100   0.9621    0.9097    1.0118    1.4927    2.4632    1.3745    1.9095    0.0977
         1000   0.5867     0.711    0.6137    0.8288    1.2229    0.6992    0.9389    0.1732
         2500   0.4932    0.6253    0.5052    0.6677    0.9607    0.5581    0.7385    0.1942
     12   100   0.4389    0.4423    0.4865    0.6981    1.2262    0.7238    1.0752    0.0654
         1000   0.1945    0.2675    0.2374    0.3063    0.4542    0.3081    0.4212    0.0996
         2500   0.1456    0.2163    0.1815    0.2293    0.3285    0.2326    0.3108    0.0994
     13   100   0.4289    0.4419    0.4889    0.7277    1.3762    0.6792    0.9784    0.0684
         1000   0.1219    0.2108    0.1791    0.2533    0.4253    0.2072    0.2962    0.0716
         2500   0.0723    0.1507    0.1167    0.1641    0.2737    0.1308    0.1871    0.0538
     15   100   0.0903    0.0631    0.0703    0.1104    0.2223    0.1286    0.2116    0.0113
         1000   0.0147    0.0208    0.0173    0.0257    0.0477    0.0304    0.0486    0.0065
         2500   0.0077    0.0134      0.01    0.0148     0.027     0.018    0.0284    0.0043
     19   100   0.0505    0.0327    0.0423    0.1231    0.6511    0.0916     0.263    0.0018
         1000   0.0004    0.0029    0.0018    0.0051    0.0235    0.0026    0.0077    0.0001
         2500   0.0001    0.0011    0.0005    0.0014    0.0066    0.0007     0.002    0.0001
     22   100   0.8416    0.8513    0.9462    1.3895    2.3862    1.3423     1.937    0.1067
         1000   0.4001    0.5509    0.4857    0.6315    0.9311    0.5928    0.8078    0.1784
         2500   0.3116    0.4587    0.3814    0.4865    0.6895    0.4525    0.6014    0.1844
     23   100   0.7869    0.6109    0.7071    1.1304    1.9354    1.0402    1.4977    0.0718
         1000   0.5907    0.3161    0.2601    0.4032    0.7562    0.3033    0.4961    0.0945
         2500   0.2641    0.2485    0.1956    0.2701    0.4789    0.2151    0.3065    0.0917
     24   100   0.4771    0.7096    0.8014    1.2902    2.7506    1.1366     1.815     0.097
         1000   0.1068    0.4089    0.3443    0.4997    0.8546    0.3915    0.5795    0.1214
         2500   0.0604    0.3038    0.2328    0.3314    0.5653     0.257    0.3731    0.1004
     27   100   5.7691     3.876    4.3972    6.8032   12.1572    6.2664    9.1695    0.4075
         1000   4.7022    1.6839    1.3201    2.1725    3.9145    1.5474    2.6182    0.4851
         2500   2.9109    1.2055     0.938    1.3235     2.405    1.0177    1.5117    0.4597

Table 3: Results of the simulation study: Chosen bandwidth